This week we're going to explore Markov chain Monte Carlo (commonly referred to as MCMC) methods, which are techniques for sampling based on random walks. In other words, MCMC is a class of techniques for sampling from a probability distribution; it can be used, for example, to estimate the distribution of parameters given a set of observations.
After this session you will be familiar with the following concepts.

Keywords: random walks — Bayesian inference — Markov chain Monte Carlo: sampling, Metropolis algorithm
The Metropolis-Hastings algorithm is a technique for sampling from high-dimensional distributions or functions that are difficult to sample directly, for instance because of intractable normalization integrals. It is an MCMC algorithm: Markov chain because the next sample $x'$ depends only on the current sample $x$, and Monte Carlo because it generates random samples that can be used to, e.g., compute integrals.
The key ingredient of this technique is the proposal distribution $g(x \rightarrow x')$, which is used to generate the next candidate of the Markov chain from the current state. From it, the acceptance probability $\alpha$ is computed: the probability with which the new candidate is accepted, given by $\alpha = \min\left(1, \frac{f(x')}{f(x)} \frac{g(x' \rightarrow x)}{g(x \rightarrow x')}\right)$, where $f(x)$ is any function proportional to the target distribution $P(x)$ we want to sample from. If the proposal distribution is symmetric, i.e. $g(x \rightarrow x') = g(x' \rightarrow x)$, the algorithm is simply called Metropolis, and the second factor in the equation for $\alpha$ is unity. This holds, for example, if $g$ is a Gaussian centred on the current state.
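A single Metropolis-Hastings update can be sketched in a few lines. This is an illustrative implementation (the names `mh_step`, `g_sample`, and `g_density` are our own); the example target and proposal below show that with a symmetric Gaussian proposal the $g$-ratio drops out:

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, f, g_sample, g_density):
    """One Metropolis-Hastings update starting from state x.

    f:         function proportional to the target density P
    g_sample:  draws a candidate x' from g(x -> x')
    g_density: evaluates g(a -> b) as g_density(a, b)
    """
    x_prime = g_sample(x)
    # alpha = min(1, f(x')/f(x) * g(x' -> x)/g(x -> x'))
    ratio = (f(x_prime) / f(x)) * (g_density(x_prime, x) / g_density(x, x_prime))
    if rng.random() < min(1.0, ratio):
        return x_prime          # accept the candidate
    return x                    # reject: stay at the current state

# Example: sample a standard normal. The Gaussian proposal is symmetric,
# so the g-ratio is unity and this reduces to the plain Metropolis algorithm.
f = lambda x: np.exp(-0.5 * x**2)
g_sample = lambda x: x + rng.normal()
g_density = lambda a, b: np.exp(-0.5 * (b - a)**2)

chain = [0.0]
for _ in range(5000):
    chain.append(mh_step(chain[-1], f, g_sample, g_density))
```

After discarding the first few hundred samples as burn-in, the chain's mean and standard deviation should be close to 0 and 1, respectively.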

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats
%matplotlib inline
# to-do: implement the equation of a circle as a function
def circle(x, y, M_x=1, M_y=2, R=3):
    # zero on the circle of radius R centred on (M_x, M_y)
    return (x - M_x)**2 + (y - M_y)**2 - R**2
The steps of the Metropolis algorithm are:
1. Pick a starting point $(x, y)$.
2. Calculate $f(x, y)$.
3. Draw a proposal parameter $(x', y')$ from a symmetric distribution.
4. Calculate $f(x', y')$.
5. Accept the suggested parameter with probability $\alpha = \min\left(1, \frac{f(x', y')}{f(x, y)}\right)$.
6. If accepted, set $(x, y) = (x', y')$ and repeat from 3 for the desired number of iterations.
# to-do: complete the Metropolis algorithm that samples points from within the circle
def metropolis(f, iterations=1000):
    '''Metropolis algorithm.

    Args:
        f (function): Function that is proportional
            to the target distribution.
        iterations (int): Number of iterations.

    Returns:
        samples: Array filled with the drawn samples.
    '''
    # initialize samples array
    samples = np.zeros((iterations, 2))
    # first sample
    x, y = 0., 0.
    # metropolis steps
    for i in range(iterations):
        # symmetric Gaussian proposal centred on the current state
        x_prime, y_prime = np.array([x, y]) + np.random.normal(size=2)
        # random sampling for random choice of acceptance w/ prob = alpha
        if np.random.rand() < min(1, f(x_prime, y_prime) / f(x, y)):
            # accept
            x, y = x_prime, y_prime
        # store the current state (repeated if the candidate was rejected)
        samples[i] = np.array([x, y])
    return samples
# to-do: use metropolis to sample from a circle and plot the samples
metro_samples = metropolis(f=circle, iterations=10**5)
# note: circle() is not a proper non-negative density, so this does not
# always work; points outside the circle can be accepted
# plot
sns.jointplot(x=metro_samples[:, 0], y=metro_samples[:, 1])
# to-do: implement the multivariate Gaussian as a function
# Hint: use scipy.stats.multivariate_normal.pdf
# and specify a mu (mean vector) and sigma (covariance matrix)
def multi_gauss(x, y, mu=np.array([5, 5]), cov_mat=np.array([[1., .9], [.9, 1.]])):
    return scipy.stats.multivariate_normal.pdf([x, y], mean=mu, cov=cov_mat)
# to-do: use metropolis to sample from the multivariate Gaussian and plot the samples
samples = metropolis(multi_gauss, iterations=10**5)
# plot
sns.jointplot(x=samples[:, 0], y=samples[:, 1])
# to-do: calculate the mean and the covariance matrix of the sample
# and compare it to the actual distribution
mean = np.array([np.mean(samples[:, 0]), np.mean(samples[:, 1])])
cov = np.cov(samples.T)
print('Calculated mean: ', np.round(mean, 2))
print('Calculated covariance: ', np.round(cov, 2))
Calculated mean:  [4.99 4.99]
Calculated covariance:  [[1.02 0.92]
 [0.92 1.01]]
In your daily life as a data scientist, you might not want to implement MCMC sampling yourself. Luckily, the pymc3 module provides all the functionality we might want for MCMC. What follows is the general usage of the module, presented with the example of fitting a straight line whose parameters are estimated with an MCMC algorithm.
In the lecture Bayes' theorem was introduced. We can write it in a slightly shorter version as
$$p(\theta|x, y) \propto p(\theta)\cdot p(x, y|\theta),$$

i.e. the posterior distribution of the parameters $\theta$ is proportional to the prior times the likelihood of our data sample. We can use this formalism to solve a linear regression problem. Let us assume our fitting parameters are $\theta = (q, s, \sigma)$, where $q$ is the intercept, $s$ the slope and $\sigma$ the intrinsic scatter of $y$.
Furthermore let us assume that our observed $y$ is normally distributed around the true regression relation.
The likelihood of our data sample is then easily computed (see lecture notes).
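Concretely, with each $y_i$ scattered around the line $s x_i + q$ with standard deviation $\sigma$, the likelihood is a product of Gaussians (a sketch consistent with the notation above):

$$p(x, y|\theta) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(y_i - (s\, x_i + q)\right)^2}{2\sigma^2}\right)$$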
The prior encapsulates all the information we believe to have about the fitting parameters before having seen any data, e.g. realistic ranges within which we expect the parameters to lie.
What we want is to sample from the posterior distribution (which in real-world examples can be very complicated!). Once we have a good idea of how this distribution looks for each parameter, we can compute the expected parameter values (mean) as well as their errors (standard deviation). Let's see how to do this in code.
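To make the connection between the formula and code explicit, here is a sketch of the unnormalized log-posterior for this model. The flat prior ranges here are an assumption for illustration only (the pymc3 model below uses normal and half-normal priors instead), and `log_posterior` is our own name:

```python
import numpy as np

def log_posterior(theta, x, y):
    """Unnormalized log-posterior for the line model y = s*x + q
    with Gaussian intrinsic scatter sigma, theta = (q, s, sigma)."""
    q, s, sigma = theta
    # log-prior: flat within assumed ranges, -inf (zero probability) outside
    if not (-10 < q < 10 and -10 < s < 10 and 0 < sigma < 10):
        return -np.inf
    # Gaussian log-likelihood of the observed y around the regression line
    resid = y - (s * x + q)
    return np.sum(-0.5 * (resid / sigma)**2 - np.log(sigma))
```

A sampler like the `metropolis` function above, adapted to three parameters and working with this quantity in log space, would then draw $(q, s, \sigma)$ from the posterior.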
import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt
import scipy
intercept_true = 0.5
slope_true = 0.5
sigma_true = 0.3
nobs = 100
x_true = np.sort(5 * np.random.rand(nobs))
y_isc = sigma_true * scipy.stats.multivariate_normal.rvs(size=nobs)
y = slope_true * x_true + intercept_true + y_isc
plt.scatter(x_true, y, marker='.')
xx = np.linspace(x_true.min(), x_true.max(), 100)
true_relation = intercept_true + xx * slope_true
plt.plot(xx, true_relation, 'k--')
plt.show()
with pm.Model() as linear_model:
    # priors on the intercept, slope and intrinsic scatter
    prior_intercept = pm.Normal('Intercept', mu=0, sd=1)
    prior_slope = pm.Normal('slope', mu=0, sd=1)
    prior_sigma = pm.HalfNormal('sigma', sd=0.5)
    # expected value of y at every x
    prior_mean = prior_intercept + prior_slope * x_true
    # likelihood of the observed data
    Y_obs = pm.Normal('Y_obs', mu=prior_mean, sd=prior_sigma, observed=y)
    # Metropolis step method and sampling
    step = pm.Metropolis()
    linear_trace = pm.sample(500, step)
Sampling 4 chains for 1_000 tune and 500 draw iterations (4_000 + 2_000 draws total) took 41 seconds. The rhat statistic is larger than 1.2 for some parameters. The estimated number of effective samples is smaller than 200 for some parameters.
pm.plot_trace(linear_trace, figsize=(12, 12))
intercepts = linear_trace['Intercept']
slopes = linear_trace['slope']
sigmas = linear_trace['sigma']
print('Intercept: {0:.3f} ± {1:.3f}'.format(intercepts.mean(), intercepts.std()))
print('slope: {0:.3f} ± {1:.3f}'.format(slopes.mean(), slopes.std()))
print('sigma: {0:.3f} ± {1:.3f}'.format(sigmas.mean(), sigmas.std()))
plt.figure()
plt.scatter(x_true, y, marker='.')
xx = np.linspace(x_true.min(), x_true.max(), 100)
true_relation = intercept_true + xx * slope_true
p1, = plt.plot(xx, true_relation, 'k--')
p2, = plt.plot(xx, xx*slopes.mean()+intercepts.mean(), 'r--')
plt.legend([p1, p2], ["true relation", "mcmc fit"])
Intercept: 0.444 ± 0.054
slope: 0.489 ± 0.020
sigma: 0.304 ± 0.024
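A useful byproduct of the trace: every posterior sample defines a line, so the spread of those lines gives a credible band around the fit. A sketch of the idea; the Gaussian stand-in arrays below only mimic the `intercepts` and `slopes` samples from the trace (with a real trace you would use those directly):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-ins for posterior samples; replace with intercepts/slopes from the trace
intercepts = rng.normal(0.44, 0.05, size=2000)
slopes = rng.normal(0.49, 0.02, size=2000)

xx = np.linspace(0, 5, 100)
# evaluate every sampled line on the grid -> array of shape (2000, 100)
lines = intercepts[:, None] + slopes[:, None] * xx
# 16th/84th percentiles give a ~68% credible band around the fit
lo, hi = np.percentile(lines, [16, 84], axis=0)
```

`plt.fill_between(xx, lo, hi, alpha=0.3)` then shades the band; here it widens with $x$ because the slope uncertainty is amplified with distance from the origin.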
In this exercise we will apply the methods introduced above to fit a regression line whose parameters are estimated using MCMC sampling. Furthermore, we will compare the linear fit obtained from MCMC sampling to the one from ordinary least-squares regression.
import statsmodels.formula.api as sm
import pandas as pd
s_true = -0.9594
q_true = 4.294
N = 100
# generating some sample data
x = np.sort(10 * np.random.rand(N))
yerr = np.random.rand(N)
y = s_true * x + q_true
y += np.abs(y) * np.random.randn(N)
y += yerr * np.random.randn(N)
# to-do: perform mcmc sampling and retrieve the fitting parameters.
with pm.Model() as linear_model:
    # priors on intercept (q), slope (m) and intrinsic scatter
    prior_q = pm.Normal("intercept", mu=0, sd=1)
    prior_m = pm.Normal("slope", mu=0, sd=1)
    prior_sigma = pm.HalfNormal("sigma", sd=.5)
    # expected value of y and likelihood of the observed data
    prior_mean = prior_q + prior_m * x
    y_obs = pm.Normal('Y_obs', mu=prior_mean, sd=prior_sigma, observed=y)
    # Metropolis step method and sampling
    step = pm.Metropolis()
    linear_trace = pm.sample(500, step, cores=1)
# to-do: perform the ols fit
df = pd.DataFrame({"x" : x, "y" : y})
reg = sm.ols(formula="y ~ x", data=df).fit()
# gather the params
mcm_q = linear_trace["intercept"]
mcm_m = linear_trace["slope"]
mcm_sigma = linear_trace["sigma"]
ols_q, ols_m = reg.params
ols_qerr, ols_merr = reg.bse
# print the params
print("mcm parameters:\n")
print("Intercept: {0:.3f} ± {1:.3f}".format(mcm_q.mean(), mcm_q.std()))
print("Slope: {0:.3f} ± {1:.3f}".format(mcm_m.mean(), mcm_m.std()))
print("Sigma: {0:.3f} ± {1:.3f}\n\n".format(mcm_sigma.mean(), mcm_sigma.std()))
print("ols parameters:\n")
print("Intercept: {0:.3f} ± {1:.3f}".format(ols_q, ols_qerr))
print("Slope: {0:.3f} ± {1:.3f}".format(ols_m, ols_merr))
# to-do: visually show the different fits
xx = np.linspace(x.min(), x.max(), num=100)
plt.figure()
plt.scatter(x, y)
p1, = plt.plot(xx, ols_q + ols_m*xx, "b--")
p2, = plt.plot(xx, mcm_q.mean() + mcm_m.mean()*xx, "r--")
plt.legend([p1, p2], ["ols reg", "mcm reg"])
plt.xlabel("x")
plt.ylabel("y")
plt.show()
Sampling 2 chains for 1_000 tune and 500 draw iterations (2_000 + 1_000 draws total) took 49 seconds. The rhat statistic is larger than 1.2 for some parameters. The estimated number of effective samples is smaller than 200 for some parameters.
mcm parameters:

Intercept: 0.472 ± 0.050
Slope: 0.250 ± 0.007
Sigma: 0.324 ± 0.028

ols parameters:

Intercept: 0.441 ± 0.066
Slope: 0.255 ± 0.011